Import modules and establish connection to database

Import packages for handling, plotting and searching data. Set notebook options.

Packages needed: pandas, seaborn, numpy, matplotlib, pymongo, networkx, scikit-learn (sklearn), scipy, graphviz, pydotplus

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from bson.objectid import ObjectId

%matplotlib inline
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)

sns.set_context("notebook", font_scale=1.25, rc={"lines.linewidth": 2})

Import pymongo to access the database and connect with the given credentials:

In [2]:
from pymongo import MongoClient
import pprint

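# Database.authenticate() was removed in PyMongo 4.0; pass the credentials to MongoClient instead
client = MongoClient("141.5.113.177:27017", username='group5', password='5wSPez4h',
                     authSource='smartshark_test', authMechanism='SCRAM-SHA-1')
db = client.smartshark_test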
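
Hard-coded credentials in a notebook are easy to leak. A minimal sketch, assuming the credentials are supplied through environment variables, builds a connection URI that could be passed to MongoClient. The variable names SMARTSHARK_USER and SMARTSHARK_PASSWORD and the helper build_mongo_uri are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, database, mechanism="SCRAM-SHA-1"):
    """Build a MongoDB connection URI with percent-encoded credentials."""
    return "mongodb://{}:{}@{}:{}/{}?authSource={}&authMechanism={}".format(
        quote_plus(user), quote_plus(password), host, port, database, database, mechanism
    )


# Hypothetical environment variable names; placeholders are used if they are unset.
uri = build_mongo_uri(
    os.environ.get("SMARTSHARK_USER", "user"),
    os.environ.get("SMARTSHARK_PASSWORD", "secret"),
    "141.5.113.177",
    27017,
    "smartshark_test",
)
# The URI could then be used as: client = MongoClient(uri)
```

The sketch only builds the string and does not open a connection.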

Basic commands

List all collections in the database (see http://smartshark2.informatik.uni-goettingen.de/documentation/ for more information):

In [3]:
# collection_names() is deprecated since PyMongo 3.7 and was removed in 4.0
print(db.list_collection_names())
['vcs_system', 'issue_system', 'code_entity_state', 'code_group_state', 'event', 'file', 'commit', 'file_action', 'plugin_schema', 'issue', 'project', 'issue_comment', 'tag', 'hunk', 'people', 'mailing_list', 'message', 'clone_instance']

Access a specific collection, get all items and store them in a pandas DataFrame:

In [4]:
#pd.DataFrame(list(db.file_action.find()))

Search for a specific ObjectId in the people collection:

In [5]:
list(db.people.find({'_id': ObjectId('5853eb373ee1b95d618826f0')}))
Out[5]:
[{'_id': ObjectId('5853eb373ee1b95d618826f0'),
  'email': 'henry@apache.org',
  'name': 'Henry Robinson',
  'username': 'henry'}]

Commit times during project

The author_date and commit_date fields are identical, so the data does not distinguish between the time the code was written and the time it was committed. The time offset is nearly always zero and is not taken into account here.

In [6]:
from matplotlib import dates

commits = db.commit
dates_commits = list(commits.find({},{'author_date':1, '_id':0}))
dates_commits = [d['author_date'] for d in dates_commits]

commits_weekdays = [d.strftime("%A") for d in dates_commits]

dates_commits = dates.date2num(dates_commits)
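# Keep the fractional part of each date number (the time of day, shifted by a constant 0.66 days)
# and add the integer day of the first commit in the list so that all times share one reference date
times = (dates_commits+0.66) % 1 + int(dates_commits[0])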
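
matplotlib date numbers count days as floats, so the fractional part of a date number is the time of day. The same quantity can be computed directly from datetime objects. The sketch below uses made-up timestamps and a hypothetical helper day_fraction, and it leaves out the constant offset applied in the cell above.

```python
from collections import Counter
from datetime import datetime


def day_fraction(ts):
    """Return the time of day of `ts` as a fraction of 24 hours (0.0 <= f < 1.0)."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return seconds / 86400.0


# Made-up commit timestamps, for illustration only.
sample = [
    datetime(2008, 5, 19, 12, 0, 0),
    datetime(2008, 5, 20, 18, 0, 0),
    datetime(2008, 5, 26, 6, 0, 0),
]

# Time of day of each commit as a fraction of a day.
fractions = [day_fraction(ts) for ts in sample]

# Number of commits per weekday name.
weekday_counts = Counter(ts.strftime("%A") for ts in sample)
```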
In [7]:
# Plot time of commits for each day
fig = plt.figure(figsize = (10,6))
ax = fig.add_subplot(111)
ax.yaxis_date()
plt.plot_date(dates_commits, times, 'r.')
plt.title ('Time during day of commits',fontsize=24)
plt.xlabel('Day of commit',             fontsize=20)
plt.ylabel('Time of day',               fontsize=20)

Number of code lines added during project

In [8]:
# Load commits and lines added (from file_actions) into DataFrames
df_lines_added = pd.DataFrame(list(db.file_action.find({},{'lines_added':1, 'commit_id':1, '_id':0})))
df_commit_time = pd.DataFrame(list(db.commit.find({},{'author_date':1})))
In [9]:
# Join file_actions with commits on commit_id to get timestamps for added lines.
df_lines_date = pd.merge(df_lines_added, df_commit_time, how='inner', left_on='commit_id', right_on='_id')[['lines_added','author_date']]
# Sort added_lines by date
df_lines_date = df_lines_date.sort_values('author_date').reset_index(drop=True)

After quick initial development, the rate at which code lines are added decreases over time. Two jumps are clearly visible. They may come from data inconsistencies or duplicated entries, or simply from a larger test piece being incorporated into the codebase.
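
One way to look into such jumps is to check for duplicated rows and to list the largest single contributions. A minimal sketch on made-up data (the column names match the DataFrame above, the values are invented):

```python
import pandas as pd

# Toy stand-in for df_lines_date; values are invented for illustration.
toy = pd.DataFrame({
    "author_date": pd.to_datetime(
        ["2010-01-01", "2010-01-02", "2010-01-02", "2010-01-03", "2010-01-04"]
    ),
    "lines_added": [10, 5000, 5000, 20, 30],
})

# Rows that repeat an earlier row exactly (possible duplicated entries).
duplicates = toy[toy.duplicated(keep="first")]

# Largest single contributions: candidates for the visible jumps.
largest = toy.nlargest(2, "lines_added")

# Share of the total growth caused by these largest entries.
jump_share = largest["lines_added"].sum() / toy["lines_added"].sum()
```

Identical rows are not necessarily errors, since two file actions of one commit can add the same number of lines, so any hits would still need to be checked by hand.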

In [10]:
# Plot cumulative sum of added_lines to see project growth over time
fig = plt.figure(figsize=(10,6))
ax = plt.plot(df_lines_date['author_date'],np.cumsum(df_lines_date['lines_added']))
plt.title('Growth of project', fontsize=24); 
plt.xlabel('Project time'    , fontsize=20); 
plt.ylabel('Lines of code'   , fontsize=20);
fig.savefig('tex/fig/lines_of_code.png')
In [11]:
# Added lines of code for each file_action
#fig = plt.figure(figsize=(12,8))
#ax = plt.plot(df_lines_date['author_date'],df_lines_date['lines_added'], '.')
#plt.title('Growth of project'); plt.xlabel('Project time'); plt.ylabel('Lines of code')

Clustering people w.r.t. work intensity

Aggregate the number of commits and the number of added and deleted code lines from the commit and file_action collections, then aggregate these per person.

In [12]:
df_people = pd.DataFrame.from_dict(list(db.people.find({},{'email':0, 'username':0})))
df_commits = pd.DataFrame.from_dict(list(db.commit.find({},{'author_id':1, '_id':1})))
df_lines = pd.DataFrame.from_dict(list(db.file_action.find({},{'lines_added':1, 'lines_deleted': 1, 'commit_id':1, '_id':0}))).groupby('commit_id', as_index = False).sum()

Merge datasets and group by author_ids to get sum of lines and commits:

In [13]:
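df_commit_lines = pd.merge(df_commits, df_lines, left_on='_id', right_on='commit_id')
# Each row is one commit, so a column of ones sums to the number of commits per author
df_commit_lines['num_commits'] = pd.Series(np.ones(len(df_commit_lines['author_id'])), index=df_commit_lines.index)
# Sum only the numeric columns; the ObjectId columns (_id, commit_id) cannot be summed
df_commit_lines = df_commit_lines.groupby('author_id', as_index=False)[['lines_added', 'lines_deleted', 'num_commits']].sum()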

Clean up the names: map different spellings of the same person to one name and remove the entry with no name:

In [14]:
df_people_all = pd.merge(df_commit_lines, df_people, left_on='author_id', right_on='_id').drop(['author_id', '_id'], axis=1)

people_dict = {"Raúl Gutiérrez Segalés":"Raul Segales", 
               "Raul Gutierrez Segales": "Raul Segales", 
               "Raul Gutierrez S":"Raul Segales",
               "Patrick D. Hunt": "Patrick Hunt",
               "fpj":"Flavio Paiva Junqueira"}

df_people_all.replace(people_dict, inplace=True)
df_people_all = df_people_all.groupby('name', as_index = False).sum()
# groupby sorts by name, so row 0 is expected to be the entry with no name
df_people_all = df_people_all.drop(0)
df_people_all.set_index('name', inplace=True)
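
The effect of the name mapping followed by a groupby on the name can be seen on a small made-up table. The names, numbers and the aliases mapping below are invented; only the pattern mirrors the cell above. Unlike the cell above, the sketch removes the nameless entry by value rather than by row position.

```python
import pandas as pd

# Invented contributors; "A. Dev" and "Alice Dev" stand for the same person.
toy_people = pd.DataFrame({
    "name": ["A. Dev", "Alice Dev", "Bob Coder", ""],
    "lines_added": [100, 50, 30, 5],
    "num_commits": [2, 1, 3, 1],
})

# Map alternative spellings onto one canonical name (hypothetical mapping).
aliases = {"A. Dev": "Alice Dev"}
toy_people["name"] = toy_people["name"].replace(aliases)

# Drop entries without a name explicitly instead of relying on row position.
toy_people = toy_people[toy_people["name"] != ""]

# Sum the contributions of rows that now share a name.
per_person = toy_people.groupby("name").sum()
```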

Plot the number of commits against the number of added code lines to identify the most productive contributors to the project.

In [15]:
ax = df_people_all.plot.scatter('lines_added', 'num_commits', 600, figsize=(24,12))
ax.tick_params(axis='both', which='major', labelsize=25)
plt.ylabel("Number commits",             fontsize=40)
plt.xlabel("Number lines of code",       fontsize=40)
plt.title ("Productiveness of people",   fontsize=48)
#plt.savefig('tex/fig/people.png', transparent = False)

Cluster people with respect to their productiveness.

In [16]:
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN
In [17]:
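# Centre each column on its mean and scale it by its range (max - min)
df_people_normalised = df_people_all.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))
# Agglomerative clustering with Ward linkage into two groups
C = AgglomerativeClustering(n_clusters=2, linkage='ward')
C.fit(df_people_normalised[['lines_added', 'num_commits']])
# Map each cluster label to a plot colour
colorMapping = {1: 'green', 0:'red'}
a = [colorMapping[d] for d in C.labels_]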
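
The normalisation above centres each column on its mean and divides by its range, so that both features contribute on a comparable scale when distances are computed. A small numpy sketch with invented numbers (the helper mean_range_normalise is hypothetical):

```python
import numpy as np


def mean_range_normalise(values):
    """Centre each column on its mean and scale it by its range (max - min)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / (values.max(axis=0) - values.min(axis=0))


# Invented (lines_added, num_commits) pairs for four contributors.
raw = np.array([[10000.0, 100.0], [200.0, 4.0], [50.0, 1.0], [8000.0, 80.0]])
normalised = mean_range_normalise(raw)
```

After this step every column has mean zero and range one. A column with identical values would lead to a division by zero and would need separate handling.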
In [18]:
ax = df_people_all.plot.scatter('lines_added', 'num_commits', 600, a, figsize=(24,12))
ax.tick_params(axis='both', which='major', labelsize=25)
plt.ylabel("Number commits", fontsize=40)
plt.xlabel("Number lines of code", fontsize=40)
plt.title("Productiveness of people", fontsize=48)
#plt.savefig('tex/fig/people_clustered.png', transparent = False)
In [19]:
from scipy.cluster.hierarchy import dendrogram, linkage
In [20]:
# Ward linkage on all normalised columns (lines_added, lines_deleted, num_commits)
Z = linkage(df_people_normalised, 'ward')
fig = plt.figure(figsize=(14, 9))
plt.title('Clustering people with similar contribution', fontsize=24)
plt.ylabel('Difference in contribution',               fontsize=20)
dendrogram(
    Z,
    leaf_rotation=90.,  # rotates the x axis labels
    leaf_font_size=20.,  # font size for the x axis labels
    labels=df_people_all.index,
)
plt.tight_layout()
#plt.savefig('tex/fig/dendrogram.png', transparent = True)

Graph analysis of project files

In [21]:
# Import modules for graph analysis of project files
import graphviz as gv
import networkx as nx
import pydotplus
from networkx.drawing.nx_agraph import graphviz_layout

We only look at the src folder of the project. Other folders, and probably older hierarchies, also exist and can be shown by changing the top-level folder name in the marked line below:

In [22]:
# Load file collection into variable with only the file path field
files = list(db.file.find({},{'_id':0, 'path':1}))

# Create directed graph that shows file hierarchy (all folders and no files)
G = nx.DiGraph()
for path in files:
    s = path['path'].split('/')
    if s[0] == 'src':       #----- Change to 'zookeeper' , 'docs' or other to see further folder structures
        i = 0
        currentPath = ""
        for folder in s[:-1]:
            oldPath = currentPath
            currentPath += "/" + folder
            if (i == 0):
                G.add_edge('root',currentPath)
            else:
                G.add_edge(oldPath, currentPath)
            i += 1

# Create node labels as folder name
node_dict = {}
for entry in G.nodes():
    s = entry.split('/')
    node_dict[entry] = s[-1]
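
The loop above adds one edge per parent/child folder pair along each file path. The same idea without networkx, as a minimal sketch with invented paths and a hypothetical helper folder_edges:

```python
def folder_edges(paths, top="src"):
    """Return the set of (parent, child) folder edges for paths under `top`."""
    edges = set()
    for path in paths:
        parts = path.split("/")
        if parts[0] != top:
            continue
        parent = "root"
        current = ""
        for folder in parts[:-1]:      # the last component is the file name
            current += "/" + folder
            edges.add((parent, current))
            parent = current
    return edges


# Invented example paths.
example = ["src/c/src/zookeeper.c", "src/c/include/zookeeper.h", "docs/index.html"]
edges = folder_edges(example)
```

Using a set removes the repeated edges that arise when many files share the same folders, which the DiGraph in the cell above also does implicitly.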
In [23]:
plt.figure(figsize=(50,50))
nx.draw_networkx(G, node_size=3000, labels=node_dict)
plt.savefig('tex/fig/graph_complete_nx.png', transparent = True)
In [24]:
# Plot graph with labels as tree structure and save to png
plt.figure(figsize=(40,40))
positions = nx.nx_pydot.graphviz_layout(G, prog='dot')
# The 'dot' layout is already computed above; nx.draw itself takes no 'prog' argument
nx.draw(G, pos=positions, node_size=3400)
nx.draw_networkx_labels(G, positions, labels=node_dict, font_size=18)
#plt.savefig('tex/fig/graph_complete.png', transparent = True)
Out[24]:
{'/src': <matplotlib.text.Text at 0x14b72d240>,
 ... (one matplotlib Text label per folder node, e.g. below /src/c, /src/contrib, /src/docs and /src/java; entries omitted)
 '/src/java/jmx/org/apache/zookeeper/jmx/server/quorum': <matplotlib.text.Text at 0x14b890320>,
 '/src/java/jmx/org/apache/zookeeper/server': <matplotlib.text.Text at 0x14b812da0>,
 '/src/java/jmx/org/apache/zookeeper/server/quorum': <matplotlib.text.Text at 0x14b836320>,
 '/src/java/jmx/org/apache/zookeeper/server/util': <matplotlib.text.Text at 0x14b4aea90>,
 '/src/java/lib': <matplotlib.text.Text at 0x14b896da0>,
 '/src/java/lib/cobertura': <matplotlib.text.Text at 0x14bb5f860>,
 '/src/java/lib/cobertura/lib': <matplotlib.text.Text at 0x14bb59320>,
 '/src/java/lib/jdiff': <matplotlib.text.Text at 0x14b4c16a0>,
 '/src/java/lib/svnant': <matplotlib.text.Text at 0x14b861860>,
 '/src/java/libtest': <matplotlib.text.Text at 0x14b5e1400>,
 '/src/java/main': <matplotlib.text.Text at 0x14bb36320>,
 '/src/java/main/com': <matplotlib.text.Text at 0x14b71ecc0>,
 '/src/java/main/com/apache': <matplotlib.text.Text at 0x14b4842e8>,
 '/src/java/main/com/apache/jute': <matplotlib.text.Text at 0x14bb7dda0>,
 '/src/java/main/com/apache/jute/compiler': <matplotlib.text.Text at 0x14b5aeb00>,
 '/src/java/main/com/apache/jute/compiler/generated': <matplotlib.text.Text at 0x14b78ce80>,
 '/src/java/main/com/apache/zookeeper': <matplotlib.text.Text at 0x14b769080>,
 '/src/java/main/com/apache/zookeeper/server': <matplotlib.text.Text at 0x14b746080>,
 '/src/java/main/com/apache/zookeeper/server/auth': <matplotlib.text.Text at 0x14b572240>,
 '/src/java/main/com/apache/zookeeper/server/quorum': <matplotlib.text.Text at 0x14b7e7da0>,
 '/src/java/main/com/apache/zookeeper/server/util': <matplotlib.text.Text at 0x14b769b00>,
 '/src/java/main/com/apache/zookeeper/version': <matplotlib.text.Text at 0x14b8d9320>,
 '/src/java/main/com/apache/zookeeper/version/util': <matplotlib.text.Text at 0x14b5825c0>,
 '/src/java/main/com/yahoo': <matplotlib.text.Text at 0x14b88bda0>,
 '/src/java/main/com/yahoo/jute': <matplotlib.text.Text at 0x14b7e0860>,
 '/src/java/main/com/yahoo/jute/compiler': <matplotlib.text.Text at 0x14b702400>,
 '/src/java/main/com/yahoo/jute/compiler/generated': <matplotlib.text.Text at 0x14b836da0>,
 '/src/java/main/com/yahoo/zookeeper': <matplotlib.text.Text at 0x14b8f8860>,
 '/src/java/main/com/yahoo/zookeeper/server': <matplotlib.text.Text at 0x14b72dcc0>,
 '/src/java/main/com/yahoo/zookeeper/server/auth': <matplotlib.text.Text at 0x14b5b55c0>,
 '/src/java/main/com/yahoo/zookeeper/server/quorum': <matplotlib.text.Text at 0x14b847da0>,
 '/src/java/main/com/yahoo/zookeeper/server/util': <matplotlib.text.Text at 0x14b8d9da0>,
 '/src/java/main/com/yahoo/zookeeper/version': <matplotlib.text.Text at 0x14b5b5940>,
 '/src/java/main/com/yahoo/zookeeper/version/util': <matplotlib.text.Text at 0x14b822860>,
 '/src/java/main/org': <matplotlib.text.Text at 0x14b59c400>,
 '/src/java/main/org/apache': <matplotlib.text.Text at 0x14b847320>,
 '/src/java/main/org/apache/jute': <matplotlib.text.Text at 0x14bb53860>,
 '/src/java/main/org/apache/jute/compiler': <matplotlib.text.Text at 0x14b5cfb00>,
 '/src/java/main/org/apache/jute/compiler/generated': <matplotlib.text.Text at 0x14b5e1b00>,
 '/src/java/main/org/apache/zookeeper': <matplotlib.text.Text at 0x14b582940>,
 '/src/java/main/org/apache/zookeeper/admin': <matplotlib.text.Text at 0x14b7fa860>,
 '/src/java/main/org/apache/zookeeper/cli': <matplotlib.text.Text at 0x14b724e80>,
 '/src/java/main/org/apache/zookeeper/client': <matplotlib.text.Text at 0x14b746b00>,
 '/src/java/main/org/apache/zookeeper/common': <matplotlib.text.Text at 0x14b70bcc0>,
 '/src/java/main/org/apache/zookeeper/jmx': <matplotlib.text.Text at 0x14b71e940>,
 '/src/java/main/org/apache/zookeeper/server': <matplotlib.text.Text at 0x14b7a6940>,
 '/src/java/main/org/apache/zookeeper/server/admin': <matplotlib.text.Text at 0x14bb3ada0>,
 '/src/java/main/org/apache/zookeeper/server/auth': <matplotlib.text.Text at 0x14b4c12e8>,
 '/src/java/main/org/apache/zookeeper/server/command': <matplotlib.text.Text at 0x14b572cc0>,
 '/src/java/main/org/apache/zookeeper/server/persistence': <matplotlib.text.Text at 0x14b7f3320>,
 '/src/java/main/org/apache/zookeeper/server/quorum': <matplotlib.text.Text at 0x14b85ada0>,
 '/src/java/main/org/apache/zookeeper/server/quorum/auth': <matplotlib.text.Text at 0x14b8c8860>,
 '/src/java/main/org/apache/zookeeper/server/quorum/flexible': <matplotlib.text.Text at 0x14b7a65c0>,
 '/src/java/main/org/apache/zookeeper/server/upgrade': <matplotlib.text.Text at 0x14b8df320>,
 '/src/java/main/org/apache/zookeeper/server/util': <matplotlib.text.Text at 0x14b5e85c0>,
 '/src/java/main/org/apache/zookeeper/util': <matplotlib.text.Text at 0x14b5f2b00>,
 '/src/java/main/org/apache/zookeeper/version': <matplotlib.text.Text at 0x14b7bb320>,
 '/src/java/main/org/apache/zookeeper/version/util': <matplotlib.text.Text at 0x14b872320>,
 '/src/java/systest': <matplotlib.text.Text at 0x14b7e0320>,
 '/src/java/systest/org': <matplotlib.text.Text at 0x14b7cf860>,
 '/src/java/systest/org/apache': <matplotlib.text.Text at 0x14b8af860>,
 '/src/java/systest/org/apache/zookeeper': <matplotlib.text.Text at 0x14bb53320>,
 '/src/java/systest/org/apache/zookeeper/test': <matplotlib.text.Text at 0x14b582cc0>,
 '/src/java/systest/org/apache/zookeeper/test/system': <matplotlib.text.Text at 0x14bb1cda0>,
 '/src/java/test': <matplotlib.text.Text at 0x14b569780>,
 '/src/java/test/bin': <matplotlib.text.Text at 0x14b761cc0>,
 '/src/java/test/com': <matplotlib.text.Text at 0x14b80a320>,
 '/src/java/test/com/apache': <matplotlib.text.Text at 0x14b5f2080>,
 '/src/java/test/com/apache/zookeeper': <matplotlib.text.Text at 0x14b880da0>,
 '/src/java/test/com/apache/zookeeper/server': <matplotlib.text.Text at 0x14b594240>,
 '/src/java/test/com/apache/zookeeper/test': <matplotlib.text.Text at 0x14b7ad780>,
 '/src/java/test/com/yahoo': <matplotlib.text.Text at 0x14b83c320>,
 '/src/java/test/com/yahoo/zookeeper': <matplotlib.text.Text at 0x14b58cb00>,
 '/src/java/test/com/yahoo/zookeeper/server': <matplotlib.text.Text at 0x14bb11320>,
 '/src/java/test/com/yahoo/zookeeper/test': <matplotlib.text.Text at 0x14b746e80>,
 '/src/java/test/config': <matplotlib.text.Text at 0x14bb16860>,
 '/src/java/test/data': <matplotlib.text.Text at 0x14b454438>,
 '/src/java/test/data/buffersize': <matplotlib.text.Text at 0x14b7ee320>,
 '/src/java/test/data/buffersize/create': <matplotlib.text.Text at 0x14b715b00>,
 '/src/java/test/data/buffersize/create/version-2': <matplotlib.text.Text at 0x14b759080>,
 '/src/java/test/data/buffersize/set': <matplotlib.text.Text at 0x14bb77320>,
 '/src/java/test/data/buffersize/set/version-2': <matplotlib.text.Text at 0x14b5c7240>,
 '/src/java/test/data/buffersize/snapshot': <matplotlib.text.Text at 0x14bb3a320>,
 '/src/java/test/data/buffersize/snapshot/version-2': <matplotlib.text.Text at 0x14b582240>,
 '/src/java/test/data/invalidsnap': <matplotlib.text.Text at 0x14b7b7320>,
 '/src/java/test/data/invalidsnap/version-2': <matplotlib.text.Text at 0x14b78c400>,
 '/src/java/test/data/kerberos': <matplotlib.text.Text at 0x14bb64860>,
 '/src/java/test/data/ssl': <matplotlib.text.Text at 0x14bb70320>,
 '/src/java/test/data/upgrade': <matplotlib.text.Text at 0x14b8d3860>,
 '/src/java/test/org': <matplotlib.text.Text at 0x14b86c860>,
 '/src/java/test/org/apache': <matplotlib.text.Text at 0x14b8fd860>,
 '/src/java/test/org/apache/jute': <matplotlib.text.Text at 0x14b822320>,
 '/src/java/test/org/apache/zookeeper': <matplotlib.text.Text at 0x14b759400>,
 '/src/java/test/org/apache/zookeeper/client': <matplotlib.text.Text at 0x14b854320>,
 '/src/java/test/org/apache/zookeeper/common': <matplotlib.text.Text at 0x14b4ae630>,
 '/src/java/test/org/apache/zookeeper/server': <matplotlib.text.Text at 0x14b84c320>,
 '/src/java/test/org/apache/zookeeper/server/admin': <matplotlib.text.Text at 0x14b8aa860>,
 '/src/java/test/org/apache/zookeeper/server/quorum': <matplotlib.text.Text at 0x14b885da0>,
 '/src/java/test/org/apache/zookeeper/server/quorum/auth': <matplotlib.text.Text at 0x14b84c860>,
 '/src/java/test/org/apache/zookeeper/server/util': <matplotlib.text.Text at 0x14b8bbda0>,
 '/src/java/test/org/apache/zookeeper/test': <matplotlib.text.Text at 0x14b58c400>,
 '/src/packages': <matplotlib.text.Text at 0x14b751cc0>,
 '/src/packages/deb': <matplotlib.text.Text at 0x14bb53da0>,
 '/src/packages/deb/init.d': <matplotlib.text.Text at 0x14b812320>,
 '/src/packages/deb/zookeeper.control': <matplotlib.text.Text at 0x14b8aada0>,
 '/src/packages/rpm': <matplotlib.text.Text at 0x14b89d860>,
 '/src/packages/rpm/init.d': <matplotlib.text.Text at 0x14b465400>,
 '/src/packages/rpm/spec': <matplotlib.text.Text at 0x14b77b400>,
 '/src/packages/templates': <matplotlib.text.Text at 0x14b7e0da0>,
 '/src/packages/templates/conf': <matplotlib.text.Text at 0x14b72d5c0>,
 '/src/recipes': <matplotlib.text.Text at 0x14bb1c860>,
 '/src/recipes/election': <matplotlib.text.Text at 0x14bb64320>,
 '/src/recipes/election/src': <matplotlib.text.Text at 0x14b854860>,
 '/src/recipes/election/src/java': <matplotlib.text.Text at 0x14b58ce80>,
 '/src/recipes/election/src/java/org': <matplotlib.text.Text at 0x14b761240>,
 '/src/recipes/election/src/java/org/apache': <matplotlib.text.Text at 0x14b5e1080>,
 '/src/recipes/election/src/java/org/apache/zookeeper': <matplotlib.text.Text at 0x14b866da0>,
 '/src/recipes/election/src/java/org/apache/zookeeper/recipes': <matplotlib.text.Text at 0x14b829da0>,
 '/src/recipes/election/src/java/org/apache/zookeeper/recipes/leader': <matplotlib.text.Text at 0x14b829860>,
 '/src/recipes/election/test': <matplotlib.text.Text at 0x14b8c1320>,
 '/src/recipes/election/test/org': <matplotlib.text.Text at 0x14b4242b0>,
 '/src/recipes/election/test/org/apache': <matplotlib.text.Text at 0x14bb59da0>,
 '/src/recipes/election/test/org/apache/zookeeper': <matplotlib.text.Text at 0x14b7dbda0>,
 '/src/recipes/election/test/org/apache/zookeeper/recipes': <matplotlib.text.Text at 0x14b5f2780>,
 '/src/recipes/election/test/org/apache/zookeeper/recipes/leader': <matplotlib.text.Text at 0x14bb64da0>,
 '/src/recipes/lock': <matplotlib.text.Text at 0x14b8af320>,
 '/src/recipes/lock/src': <matplotlib.text.Text at 0x14b817860>,
 '/src/recipes/lock/src/c': <matplotlib.text.Text at 0x14b5a5240>,
 '/src/recipes/lock/src/c/include': <matplotlib.text.Text at 0x14b8c1860>,
 '/src/recipes/lock/src/c/src': <matplotlib.text.Text at 0x14b8f3860>,
 '/src/recipes/lock/src/c/tests': <matplotlib.text.Text at 0x14b4261d0>,
 '/src/recipes/lock/src/java': <matplotlib.text.Text at 0x14b71e5c0>,
 '/src/recipes/lock/src/java/org': <matplotlib.text.Text at 0x14b5aee80>,
 '/src/recipes/lock/src/java/org/apache': <matplotlib.text.Text at 0x14bb4b320>,
 '/src/recipes/lock/src/java/org/apache/zookeeper': <matplotlib.text.Text at 0x14b751240>,
 '/src/recipes/lock/src/java/org/apache/zookeeper/recipes': <matplotlib.text.Text at 0x14b1db748>,
 '/src/recipes/lock/src/java/org/apache/zookeeper/recipes/lock': <matplotlib.text.Text at 0x14b57a400>,
 '/src/recipes/lock/test': <matplotlib.text.Text at 0x14b5bf780>,
 '/src/recipes/lock/test/org': <matplotlib.text.Text at 0x14b8f3320>,
 '/src/recipes/lock/test/org/apache': <matplotlib.text.Text at 0x14b73fcc0>,
 '/src/recipes/lock/test/org/apache/zookeeper': <matplotlib.text.Text at 0x14b7e7320>,
 '/src/recipes/lock/test/org/apache/zookeeper/recipes': <matplotlib.text.Text at 0x14b702e80>,
 '/src/recipes/lock/test/org/apache/zookeeper/recipes/lock': <matplotlib.text.Text at 0x14bb6b320>,
 '/src/recipes/queue': <matplotlib.text.Text at 0x14bb40860>,
 '/src/recipes/queue/src': <matplotlib.text.Text at 0x14b724b00>,
 '/src/recipes/queue/src/c': <matplotlib.text.Text at 0x14b5a5cc0>,
 '/src/recipes/queue/src/c/include': <matplotlib.text.Text at 0x14b5e8cc0>,
 '/src/recipes/queue/src/c/src': <matplotlib.text.Text at 0x14b842320>,
 '/src/recipes/queue/src/c/tests': <matplotlib.text.Text at 0x14b829320>,
 '/src/recipes/queue/src/java': <matplotlib.text.Text at 0x14b772240>,
 '/src/recipes/queue/src/java/org': <matplotlib.text.Text at 0x14bb70da0>,
 '/src/recipes/queue/src/java/org/apache': <matplotlib.text.Text at 0x14b5e8940>,
 '/src/recipes/queue/src/java/org/apache/zookeeper': <matplotlib.text.Text at 0x14b7ca860>,
 '/src/recipes/queue/src/java/org/apache/zookeeper/recipes': <matplotlib.text.Text at 0x14b5f2e80>,
 '/src/recipes/queue/src/java/org/apache/zookeeper/recipes/queue': <matplotlib.text.Text at 0x14b5e8240>,
 '/src/recipes/queue/test': <matplotlib.text.Text at 0x14b47a2e8>,
 '/src/recipes/queue/test/org': <matplotlib.text.Text at 0x14b454ac8>,
 '/src/recipes/queue/test/org/apache': <matplotlib.text.Text at 0x14b426e10>,
 '/src/recipes/queue/test/org/apache/zookeeper': <matplotlib.text.Text at 0x14b7f3da0>,
 '/src/recipes/queue/test/org/apache/zookeeper/recipes': <matplotlib.text.Text at 0x14b847860>,
 '/src/recipes/queue/test/org/apache/zookeeper/recipes/queue': <matplotlib.text.Text at 0x14b72d940>,
 'root': <matplotlib.text.Text at 0x14b8ed860>}
In [25]:
dir()
del df_commits, df_commit_lines

Merge authors with filenames

Note: the code below is less structured and less efficient than the code in the previous sections.

Get all dataframes to associate people with added files (file modifications are not considered).

In [26]:
df_added_files = pd.DataFrame(list(db.file_action.find({'mode':'A'}, {'file_id':1, 'commit_id':1, '_id':0})))
df_people = pd.DataFrame(list(db.people.find({},{'name':1})))
df_commit = pd.DataFrame(list(db.commit.find({},{'author_id':1})))
df_file_paths = pd.DataFrame(list(db.file.find({},{'path':1})))

Merge the dataframes into df_files_people, which contains only contributor names and file paths, then select only the main contributors.

In [27]:
df_commits_people = pd.merge(df_commit, df_people, left_on='author_id', right_on='_id').replace(people_dict).drop(['_id_y', 'author_id'], axis = 1)
df_files_people = pd.merge(df_added_files, df_commits_people, left_on='commit_id', right_on='_id_x').drop(['commit_id','_id_x'], axis=1)
In [28]:
df_files_people = pd.merge(df_files_people, df_file_paths, left_on='file_id', right_on='_id').drop(['file_id', '_id'], axis=1)
df_files_people['count'] = pd.Series(np.ones(len(df_files_people['name'])), index=df_files_people.index)
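The merge chain above can be illustrated on toy data. This is only a minimal sketch: the IDs, names and paths are hypothetical stand-ins for the people, commit, file_action and file collections, and renaming the key columns before merging is an assumed alternative that avoids the `_id_x`/`_id_y` suffixes.

```python
import pandas as pd

# Hypothetical stand-ins for the people, commit, file_action and file collections
people = pd.DataFrame({'_id': ['p1', 'p2'], 'name': ['Alice', 'Bob']})
commits = pd.DataFrame({'_id': ['c1', 'c2', 'c3'], 'author_id': ['p1', 'p2', 'p1']})
added = pd.DataFrame({'commit_id': ['c1', 'c2', 'c3'], 'file_id': ['f1', 'f2', 'f3']})
files = pd.DataFrame({'_id': ['f1', 'f2', 'f3'],
                      'path': ['src/a.java', 'src/b.c', 'docs/c.xml']})

# Rename the key columns up front so no merge produces _id_x/_id_y suffixes
people = people.rename(columns={'_id': 'author_id'})
commits = commits.rename(columns={'_id': 'commit_id'})
files = files.rename(columns={'_id': 'file_id'})

# added file -> commit -> author -> file path, keeping only name and path
toy_files_people = (added
                    .merge(commits, on='commit_id')
                    .merge(people, on='author_id')
                    .merge(files, on='file_id')
                    [['name', 'path']])
print(toy_files_people)
```

Because every key column has a distinct name, no intermediate `drop` of suffixed columns is needed.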
In [29]:
# Copy the filtered rows so that columns can be added without a SettingWithCopyWarning
contributors = df_files_people[df_files_people['name'].isin(['Rakesh Radhakrishnan', 'Patrick Hunt', 'Mahadev Konar'])].copy()
In [30]:
# Use the text after the last dot in the path as the language (file extension)
contributors['languages'] = [path.rsplit('.', 1)[-1] for path in contributors['path']]
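Taking the text after the last dot treats a path without any dot (for example a Makefile) as if the whole path were its extension. A minimal sketch of a stricter alternative using only the standard library; the helper name `file_extension` and the sample paths are hypothetical.

```python
import os.path

def file_extension(path):
    """Return the lower-cased extension without the dot, or '' if there is none."""
    return os.path.splitext(path)[1].lstrip('.').lower()

# Hypothetical sample paths
for sample in ['src/java/Foo.java', 'src/c/Makefile', 'bin/zkServer.sh']:
    print(sample, '->', repr(file_extension(sample)))
```

Files without an extension then fall into a single empty-string group instead of producing one "language" per path.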

For each of the three main contributors, group the added files by language and sum them up.

In [31]:
groups = contributors.groupby('name')
mk = groups.get_group('Mahadev Konar').drop(['name', 'path'], axis=1).groupby('languages').sum()
ph = groups.get_group('Patrick Hunt').drop(['name', 'path'], axis=1).groupby('languages').sum()
rr = groups.get_group('Rakesh Radhakrishnan').drop(['name', 'path'], axis=1).groupby('languages').sum()
mk.columns = ['Mahadev Konar']
ph.columns = ['Patrick Hunt']
rr.columns = ['Rakesh Radhakrishnan']
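The per-contributor tables above could also be built in one step. A minimal sketch on hypothetical toy data, assuming one row per added file with `name` and `languages` columns as in `contributors`:

```python
import pandas as pd

# Hypothetical toy data: one row per added file
toy = pd.DataFrame({
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'languages': ['java', 'java', 'java', 'xml', 'xml'],
})

# Count rows per (language, contributor), then pivot contributors into columns
per_language = toy.groupby(['languages', 'name']).size().unstack(fill_value=0)
print(per_language)
```

The result has one row per language and one column per contributor, so the column labels come straight from the data and do not have to be typed by hand.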
In [32]:
# join_axes was removed in pandas 1.0; reindexing on mk's index gives the same alignment
ax = pd.concat([mk, ph, rr], axis=1).reindex(mk.index).loc[['java', 'cpp','h', 'py', 'txt', 'xml']].plot.bar(figsize=(24,12))
ax.legend(loc=1,prop={'size':35})
ax.tick_params(axis='both', which='major', labelsize=30)
plt.yscale('log')
plt.ylabel('Number of added files', fontsize = 40)
plt.xlabel('Languages', fontsize=40)
plt.title('Most added file types', fontsize=48)
plt.tight_layout()
#plt.savefig('tex/fig/languages.png', transparent=False)

Contributor with the most added files per directory

In [33]:
# assign() returns a new DataFrame, so no SettingWithCopyWarning is raised
contributors = contributors.assign(short_path=['/' + path.rsplit('/', 1)[0] for path in contributors['path']])
In [34]:
df_path_names = contributors.drop(['path', 'languages'], axis=1)
df_path_names = df_path_names.groupby(['short_path', 'name']).count()
In [35]:
# Color each directory node by the contributor who added the most files there:
# 0 = Mahadev Konar, 1 = Patrick Hunt, 2 = Rakesh Radhakrishnan, 3 = none of the three
main_contributors = ['Mahadev Konar', 'Patrick Hunt', 'Rakesh Radhakrishnan']
node_colors = np.full(len(G.nodes()), 3.0)
for i, path in enumerate(G.nodes()):
    cur_max = 0
    for color, name in enumerate(main_contributors):
        try:
            count = df_path_names.loc[(path, name), 'count']
        except KeyError:
            # This contributor added no files to this directory
            continue
        if count > cur_max:
            cur_max = count
            node_colors[i] = color
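The per-directory lookup above amounts to picking, for each directory, the contributor with the largest count. A minimal sketch of that selection with `groupby` and `idxmax`, on hypothetical toy data (the directory names, contributor names and counts are made up):

```python
import pandas as pd

# Hypothetical counts of added files per (directory, contributor)
counts = pd.DataFrame({
    'short_path': ['/src', '/src', '/docs', '/docs', '/bin'],
    'name': ['Alice', 'Bob', 'Alice', 'Bob', 'Bob'],
    'count': [5, 2, 1, 4, 3],
})

# idxmax gives, per directory, the row label of the largest count
top_rows = counts.groupby('short_path')['count'].idxmax()
top_contributor = counts.loc[top_rows].set_index('short_path')['name']
print(top_contributor)
```

Mapping the resulting names to color indices would then give the node colors without an explicit loop; directories that do not appear in the counts would still need a default color.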
In [36]:
# Plot graph with labels as tree structure and save to png
plt.figure(figsize=(150,150))
positions = nx.nx_pydot.graphviz_layout(G, prog='dot')
plt.set_cmap('Set1')
nx.draw(G, pos=positions, node_size=12000, node_color=node_colors, vmax=9, width=4)
nx.draw_networkx_labels(G, positions, labels=node_dict, font_size=35)
plt.savefig('tex/fig/graph_complete.png', transparent = True)

Unfinished:

Do regression for number of added lines based on author, time added, date added, language used

In [37]:
df_people = pd.DataFrame.from_dict(list(db.people.find({},{'email':0, 'username':0})))
df_commits = pd.DataFrame.from_dict(list(db.commit.find({},{'author_id':1, '_id':1, 'author_date':1})))
df_lines = pd.DataFrame.from_dict(list(db.file_action.find({},{'lines_added':1, 'lines_deleted': 1, 'commit_id':1, '_id':0}))).groupby('commit_id', as_index = False).sum()

df_commit_lines = pd.merge(df_commits, df_lines, left_on='_id', right_on='commit_id')
df_commit_lines['num_commits'] = pd.Series(np.ones(len(df_commit_lines['author_id'])), index=df_commit_lines.index)
df_commit_lines = df_commit_lines.groupby('author_id', as_index=False).sum()

df_people_all = pd.merge(df_commit_lines, df_people, left_on='author_id', right_on='_id').drop(['author_id', '_id'], axis=1)

people_dict = {"Raúl Gutiérrez Segalés":"Raul Segales", 
               "Raul Gutierrez Segales": "Raul Segales", 
               "Raul Gutierrez S":"Raul Segales",
               "Patrick D. Hunt": "Patrick Hunt",
               "fpj":"Flavio Paiva Junqueira"}

df_people_all.replace(people_dict, inplace=True)
#df_people_all = df_people_all.groupby('name', as_index = False).sum()
#df_people_all = df_people_all.drop(0)
#df_people_all.set_index('name', inplace=True)
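Some of the aliases in people_dict differ only by accented characters. A minimal sketch, using only the standard library, of stripping accents before applying the alias mapping; the helper name `strip_accents` is hypothetical, and abbreviations such as "fpj" would still need an explicit entry.

```python
import unicodedata

def strip_accents(name):
    """Return name with combining accent marks removed (e.g. 'ú' -> 'u')."""
    decomposed = unicodedata.normalize('NFKD', name)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents('Raúl Gutiérrez Segalés'))
```

After this step, accented and unaccented spellings of the same name collapse to one key, which shortens the manual alias dictionary.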

Who wrote how many messages

In [38]:
df_messages = pd.DataFrame(list(db.message.find({},{'from_id':1})))
df_people_all = pd.merge(df_commit_lines, df_people, left_on='author_id', right_on='_id').drop(['_id'], axis=1)
first_slide = df_people_all.merge(df_messages.groupby('from_id', as_index = False).count(), left_on='author_id', right_on='from_id').drop(['author_id', 'from_id'], axis=1)
In [39]:
first_slide.columns = ['Lines added', 'Lines deleted', 'Number commits', 'Example persons', 'Number messages']
first_slide.set_index('Example persons', inplace=True)
In [40]:
ax = first_slide.plot(rot=90, figsize=(24,14), linewidth=5)
# One tick per person; derive the tick count from the data instead of hard-coding it
plt.xticks(range(len(first_slide)), list(first_slide.index))
ax.legend(loc=1,prop={'size':35})
ax.tick_params(axis='both', which='major', labelsize=30)
ax.set_yscale('log')
ax.set_ylabel('Attribute count (log)', fontsize = 40)
ax.set_xlabel('', fontsize = 40)
locs, labels = plt.xticks()
plt.tight_layout()
plt.savefig('tex/fig/first_slide.png')